ABSTRACT

The problem of partitioning time series into segments is treated. The segments are considered as falling into classes. A different probability distribution is associated with each class of segment. Parametric families of distributions are considered, a set of parameter values being associated with each class. With each observation is associated an unobservable label, indicating from which class the observation arose. The label process is modeled as a Markov chain. Segmentation algorithms are obtained by applying a relaxation method to maximize the resulting likelihood function. In this paper special attention is given to the situation in which the observations are conditionally independent, given the labels. A numerical example, segmentation of U.S. Gross National Product, is given. Choice of the number of classes, using statistical model-selection criteria, is illustrated.

TIME-SERIES SEGMENTATION: A MODEL AND A METHOD

STANLEY L. SCLOVE
University of Illinois at Chicago

1. Introduction

The problem of segmentation considered here is: Given a time series

$$\{x_t,\ t = 1, \ldots, n\},$$

partition the set of values of $t$ into segments (sub-series, regimes) within which the behavior of $x_t$ is homogeneous. The segments are considered as falling into several classes.

The observation $X$ may be a scalar, vector, or matrix: any element of a linear space, for which the operations of addition and scalar multiplication are defined. (If $X$ is a scalar, operations such as $x_t - c\,x_{t-1}$, where $c$ is a scalar, are required. If $X$ is a vector or matrix, the operation $x_t - C x_{t-1}$, where $C$ is a matrix, is required.)

2. The Model

One can imagine a series which is usually relatively smooth but occasionally rather jumpy as being composed of sub-series which are first-order autoregressive, the autocorrelation coefficient being positive for the smooth segments and negative for the jumpy ones. One might try fitting such data with a segmentation of two classes, one corresponding to a positive autocorrelation, the other to a negative autocorrelation.

The mechanism generating the process changes from time to time, and these changes manifest themselves at some unknown time points (epochs, change-points). The number, say $m$, of segments and the epochs are unknown. Generally there will be fewer than $m$ generating mechanisms. The number of mechanisms (classes) will be denoted by $k$; it will be assumed that $k$ is at most $m$. In some situations, $k$ is specified; in others, it is not. Estimation of $k$ will be considered. Although the process changes from time to time, it should be stationary in the mean if it is to be segmented. Otherwise, one would merely be fitting the drifting level of the process, and the larger the series length, the larger the value of $k$ that would be required. Thus, series must be differenced to achieve stationarity before segmentation techniques are applied.

With the $c$-th class is associated a stochastic process, $P_c$, say. E.g., above we spoke of a situation with $k = 2$ classes, where, for $c = 1, 2$, the process $P_c$ is first-order autoregressive with coefficient $\phi_c$, where $\phi_1$ is positive and $\phi_2$ is negative.

Now with the $t$-th observation ($t = 1, \ldots, n$) associate the label $\gamma_t$, which is equal to $c$ if and only if $x_t$ arose from class $c$, $c = 1, \ldots, k$. Each time point $t$ gives rise to a pair

$$(x_t, \gamma_t),$$

where $x_t$ is observable and $\gamma_t$ is not. The process $\{x_t\}$ is the observed time series; the process $\{\gamma_t\}$ will be called the "label process."

Define a segmentation, then, as a partition of the time index set $\{t : t = 1, \ldots, n\}$ into subsets

$$S_1 = \{1, \ldots, t_1\},\quad S_2 = \{t_1 + 1, \ldots, t_2\},\quad \ldots,\quad S_m = \{t_{m-1} + 1, \ldots, n\},$$

where the $t$'s are subscripted in ascending order. Each subset $S_g$, $g = 1, \ldots, m$, is a segment. The integer $m$ is not specified. In the context of this model, to segment the series is merely to estimate the $\gamma$'s.

The focus in the present paper is not on the change-points $t_i$, $i = 1, \ldots, m$. Rather, the idea underlying the development here is that of transitions between classes. The labels will be treated as random variables $\gamma_t$ with transition probabilities

$$\Pr(\gamma_t = d \mid \gamma_{t-1} = c) = p_{cd},$$

taken as stationary, i.e., independent of $t$. The $k$-by-$k$ matrix of transition probabilities will be denoted by $P$, i.e.,

$$P = [p_{cd}].$$

Restrictions on the process can be imposed by setting the appropriate transition probabilities equal to zero. E.g., some processes are strictly cyclic, such as the operation of an internal-combustion engine, with its cycle of intake to compression to combustion to exhaust to intake, etc. Similarly, one might wish to describe the economy in terms of transitions from recession to recovery to expansion, not allowing transition directly from recession to expansion.
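
To make such a restriction concrete, here is a minimal Python sketch that simulates a label process from a transition matrix and checks that no forbidden transition ever occurs. The three-state matrix (0 = recession, 1 = recovery, 2 = expansion) and its entries are hypothetical values chosen for illustration; the zeros forbid direct moves between recession and expansion in either direction, and the function names are likewise illustrative.

```python
import random

# Hypothetical transition matrix: 0 = recession, 1 = recovery, 2 = expansion.
# Zero entries forbid direct moves between recession and expansion.
P_RESTRICTED = [
    [0.6, 0.4, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.3, 0.7],
]

def simulate_labels(P, n, start, seed=0):
    """Draw n labels from a homogeneous Markov chain with transition matrix P."""
    rng = random.Random(seed)
    labels = [start]
    for _ in range(n - 1):
        row = P[labels[-1]]
        # Sample the next label with probabilities given by the current row;
        # entries with weight zero are never selected.
        labels.append(rng.choices(range(len(row)), weights=row)[0])
    return labels

def forbidden_transitions(labels, P):
    """Return the consecutive (c, d) pairs whose transition probability is zero."""
    return [(c, d) for c, d in zip(labels, labels[1:]) if P[c][d] == 0.0]

labels = simulate_labels(P_RESTRICTED, 500, start=0)
print(forbidden_transitions(labels, P_RESTRICTED))  # []
```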

Segmentation will involve the simultaneous estimation of several sets of parameters: the distributional parameters of the within-class stochastic processes, the transition probabilities, and the labels. In order to develop a procedure for maximum likelihood estimation, the likelihood must first be obtained.

To do this, note that a joint probability density function (p.d.f.) for the whole process $\{(X_t, \gamma_t),\ t = 1, \ldots, n\}$ can be obtained by successively conditioning each variable on all the preceding ones. The label $\gamma_t$ is considered as preceding the corresponding observation $X_t$. The variable $X_1$ is conditioned on $\gamma_1$; $\gamma_2$ on $X_1$ and $\gamma_1$; $X_2$ on $\gamma_2$, $X_1$, and $\gamma_1$; $\gamma_3$ on $X_2$, $\gamma_2$, $X_1$, and $\gamma_1$; $X_3$ on $\gamma_3$, $X_2$, $\gamma_2$, $X_1$, and $\gamma_1$; etc. Using $f$ as a generic symbol for any p.d.f., this leads to the joint p.d.f.

$$f(\gamma_1)\, f(x_1 \mid \gamma_1) \prod_{t=2}^{n} f(\gamma_t \mid x_{t-1}, \gamma_{t-1}, \ldots, x_1, \gamma_1)\, f(x_t \mid \gamma_t, x_{t-1}, \gamma_{t-1}, \ldots, x_1, \gamma_1). \tag{2.1}$$


The working assumptions of this paper are the following.

(A.1) The label process $\{\gamma_t\}$ is a first-order Markov chain, homogeneous in the sense of having stationary transition probabilities, and conditionally independent of the observations; i.e.,

$$f(\gamma_t \mid x_{t-1}, \gamma_{t-1}, \ldots, x_1, \gamma_1) = f(\gamma_t \mid \gamma_{t-1}). \tag{2.2}$$

When $\gamma_{t-1} = c$ and $\gamma_t = d$, then

$$f(\gamma_t \mid \gamma_{t-1}) = p_{cd},$$

and these transition probabilities do not depend upon $t$. (The first-order assumption is not critical.)

(A.2) The distribution of the random variable $X_t$ depends only upon its own label and previous $X$'s, not previous labels:

$$f(x_t \mid \gamma_t, x_{t-1}, \gamma_{t-1}, \ldots, x_1, \gamma_1) = f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1). \tag{2.3}$$

With these assumptions (2.1) becomes

$$f(\gamma_1)\, f(x_1 \mid \gamma_1) \prod_{t=2}^{n} p_{\gamma_{t-1}\gamma_t}\, f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1). \tag{2.4}$$

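Note that this is

$$\Bigl(\prod_{c=1}^{k} \prod_{d=1}^{k} p_{cd}^{\,n_{cd}}\Bigr)\, f(\gamma_1)\, f(x_1 \mid \gamma_1) \prod_{t=2}^{n} f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1), \tag{2.5}$$

where the (unobservable) quantity $n_{cd}$ is the number of transitions from class $c$ to class $d$.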
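
Passing from (2.4) to (2.5) simply groups the factors $p_{\gamma_{t-1}\gamma_t}$ by the pair of classes involved. The following Python sketch checks numerically that the two forms of the transition part agree; it assumes labels coded $1, \ldots, k$ and a transition matrix given as nested lists, and the function names and test values are hypothetical.

```python
import math

def transition_counts(labels, k):
    """n[c-1][d-1] = number of transitions from class c to class d (labels in 1..k)."""
    n = [[0] * k for _ in range(k)]
    for prev, cur in zip(labels, labels[1:]):
        n[prev - 1][cur - 1] += 1
    return n

def log_transitions_by_time(labels, P):
    """Sum over t = 2..n of ln p_{gamma_{t-1} gamma_t}, as in (2.4)."""
    return sum(math.log(P[a - 1][b - 1]) for a, b in zip(labels, labels[1:]))

def log_transitions_by_counts(labels, P):
    """Sum over c, d of n_cd ln p_cd, as in (2.5); empty cells contribute nothing."""
    k = len(P)
    n = transition_counts(labels, k)
    return sum(n[c][d] * math.log(P[c][d])
               for c in range(k) for d in range(k) if n[c][d] > 0)

labels = [1, 1, 2, 2, 2, 1]
P = [[0.5, 0.5], [0.25, 0.75]]
print(transition_counts(labels, 2))  # [[1, 1], [1, 2]]
```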

This model, with transition probabilities, has certain advantages over a model based on the change-points. The change-points are discrete parameters, and, even if the corresponding generalized likelihood ratio were asymptotically chi-square, the number of degrees of freedom would not be clear. On the other hand, the transition probabilities vary in an interval, and it is clear that they constitute a set of $k(k-1)$ free parameters, since each of the $k$ rows of $P$ sums to one.

Examples. (i) If each class-conditional process $P_c$ is a first-order Markov process, then

$$f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1) = f(x_t \mid \gamma_t, x_{t-1}). \tag{2.6}$$

(ii) If in addition the $c$-th class-conditional process is Gaussian first-order autoregressive with autoregression coefficient $\phi_c$ and constant $\delta_c$, with common variance $\sigma^2$, then (2.6) holds with

$$f(x_t \mid \gamma_t = c, x_{t-1}) = (2\pi\sigma^2)^{-1/2} \exp[-u_{tc}^2/(2\sigma^2)],$$

where

$$u_{tc} = x_t - (\phi_c x_{t-1} + \delta_c).$$

E.g., the value of the likelihood for

$$\gamma_1 = \gamma_2 = \cdots = \gamma_r = 1 \quad\text{and}\quad \gamma_{r+1} = \gamma_{r+2} = \cdots = \gamma_n = 2$$

is, for given $x_0$,

$$p_{11}^{\,r-1}\, p_{12}\, p_{22}^{\,n-r-1}\, (2\pi\sigma^2)^{-n/2} \exp[-q/(2\sigma^2)],$$

where

$$q = \sum_{t=1}^{r} [x_t - (\phi_1 x_{t-1} + \delta_1)]^2 + \sum_{t=r+1}^{n} [x_t - (\phi_2 x_{t-1} + \delta_2)]^2.$$
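
As a check on this closed form, the following Python sketch evaluates its logarithm and compares it with a term-by-term sum of $\ln p_{\gamma_{t-1}\gamma_t}$ and Gaussian AR(1) log densities. The function names and test values are hypothetical; classes 1 and 2 are stored at list positions 0 and 1, and $1 \le r \le n-1$ is assumed.

```python
import math

def ar1_log_density(x_t, x_prev, phi, delta, sigma2):
    """ln f(x_t | gamma_t = c, x_{t-1}) for a Gaussian AR(1) class."""
    u = x_t - (phi * x_prev + delta)
    return -0.5 * math.log(2 * math.pi * sigma2) - u * u / (2 * sigma2)

def two_segment_log_lik(x, x0, r, phi, delta, sigma2, P):
    """Log of p11^(r-1) p12 p22^(n-r-1) (2 pi sigma2)^(-n/2) exp(-q / (2 sigma2))
    for labels 1 at t = 1..r and 2 at t = r+1..n; x holds x_1..x_n."""
    n = len(x)
    prev = [x0] + list(x[:-1])          # x_{t-1} for t = 1..n
    q = 0.0
    for i in range(n):                  # i = t - 1 (0-based index)
        c = 0 if i < r else 1
        q += (x[i] - (phi[c] * prev[i] + delta[c])) ** 2
    log_trans = ((r - 1) * math.log(P[0][0]) + math.log(P[0][1])
                 + (n - r - 1) * math.log(P[1][1]))
    return log_trans - 0.5 * n * math.log(2 * math.pi * sigma2) - q / (2 * sigma2)

def term_by_term(x, x0, r, phi, delta, sigma2, P):
    """The same quantity, accumulated one time point at a time."""
    labels = [0] * r + [1] * (len(x) - r)
    prev = [x0] + list(x[:-1])
    total = sum(ar1_log_density(x[i], prev[i], phi[labels[i]], delta[labels[i]], sigma2)
                for i in range(len(x)))
    total += sum(math.log(P[a][b]) for a, b in zip(labels, labels[1:]))
    return total

example = ([0.3, 0.5, 0.2, -0.4, 0.1, -0.2], 0.1, 3,
           (0.6, -0.5), (0.1, 0.0), 0.2, [[0.8, 0.2], [0.3, 0.7]])
print(two_segment_log_lik(*example), term_by_term(*example))
```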

In regard to (A.2), in the simplest case the $X$'s are (conditionally) independent, given the labels. That is, the distribution of $X_t$ depends only upon its label, and not previous $X$'s. Then

$$f(x_t \mid \gamma_t, x_{t-1}, \gamma_{t-1}, \ldots, x_1, \gamma_1) = f(x_t \mid \gamma_t).$$

We shall pay special attention to this case in the present paper. In this case the p.d.f.'s $f(x \mid \gamma_t = c)$, $c = 1, \ldots, k$, are called class-conditional densities. In the parametric case the class-conditional density takes the form

$$f(x_t \mid \gamma_t = c) = g(x_t; \theta_c), \tag{2.7}$$

where $\theta$ is a parameter indexing a family of p.d.f.'s of the form given by the function $g$, and $\theta_c$ is its value for the $c$-th class. For example, in the case of Gaussian class-conditional distributions, $\theta_c$ consists of the mean and variance for the $c$-th class.

3. An Algorithm

3.1. Development of the algorithm

The likelihood $L$ is (2.5), considered as a function of the parameters, for fixed $\{x_t\}$. From (2.5) and (2.7), the likelihood $L$ can be written in the form

$$L = A(\{p_{cd}\}, \{\gamma_t\})\, B(\{\gamma_t\}, \{\theta_c\}). \tag{3.1}$$

Hence, for fixed values of the $\gamma$'s and $\theta$'s, $L$ is maximized with respect to the $p$'s by maximizing the factor $A$. But

$$A = \prod_{c=1}^{k} \prod_{d=1}^{k} p_{cd}^{\,n_{cd}}.$$

The $n_{cd}$ are determined by the $\gamma$'s. So from the usual multinomial model, it follows that maximum likelihood estimation of the $p$'s, for fixed values of the other parameters, is given by taking the estimate of $p_{cd}$ to be

$$\hat p_{cd} = n_{cd}/n_c, \tag{3.2}$$

where

$$n_c = n_{c1} + n_{c2} + \cdots + n_{ck}.$$
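
The estimate (3.2) is the row-normalized table of transition counts. A minimal Python sketch, assuming labels coded $1, \ldots, k$; the text does not say what to do for a class with no outgoing transitions ($n_c = 0$), so here such a row is simply left at zero, an assumption of this sketch. The function name is hypothetical.

```python
def estimate_transitions(labels, k):
    """Return the k-by-k matrix of estimates n_cd / n_c from (3.2)."""
    counts = [[0] * k for _ in range(k)]
    for prev, cur in zip(labels, labels[1:]):
        counts[prev - 1][cur - 1] += 1
    P_hat = []
    for row in counts:
        n_c = sum(row)                  # n_c = n_c1 + ... + n_ck
        # Assumption: a class that is never left gets an all-zero row.
        P_hat.append([n_cd / n_c if n_c else 0.0 for n_cd in row])
    return P_hat

print(estimate_transitions([1, 1, 2, 2, 2, 1], 2))
# [[0.5, 0.5], [0.3333333333333333, 0.6666666666666666]]
```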

Further, given the $p$'s and $\gamma$'s, the estimates of the distributional parameters, the $\theta$'s, are easy to obtain because the observations have been sorted into $k$ groups. This suggests the following algorithm.

Step 0. Set the $\theta$'s at initial values, perhaps suggested by previous knowledge of the phenomenon under study. Set the $p$'s at initial values, e.g., $1/k$. Set $f(\gamma_1)$ at initial values, e.g., $f(\gamma_1) = 1/k$, for $\gamma_1 = 1, \ldots, k$.

Step 1. Estimate $\gamma_1$ by maximizing $f(\gamma_1)\, f(x_1 \mid \gamma_1)$.

Step 2. For $t = 2, \ldots, n$, estimate $\gamma_t$ by maximizing the current estimate of

$$p_{\gamma_{t-1}\gamma_t}\, f(x_t \mid \gamma_t, x_{t-1}, \ldots, x_1),$$

as the likelihood can be expressed as a product of such factors.

Step 3. Now, having labeled the observations, estimate the distributional parameters, and estimate the transition probabilities according to (3.2).

Step 4. If no observation has changed labels from the previous iteration, stop. Otherwise, repeat the procedure from Step 1.

This method of maximizing with respect to one set of variables while the others remain fixed, then maximizing with respect to the second set while the first remain fixed, etc., is a relaxation method.
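
The following Python sketch strings Steps 0 to 4 together for the special case treated in Section 3.2: scalar observations that, given the labels, are independent normal with class-specific means and a common variance. It is an illustration, not the program used for the results reported below. The starting values (overall sample variance, $p_{cd} = 1/k$, $f(\gamma_1) = 1/k$), the variance floor, and the rule of keeping the old mean or transition row for an empty class are assumptions of this sketch; labels are coded $0, \ldots, k-1$ and all function names are hypothetical.

```python
import math

def log_normal(x, mean, var):
    """ln of the N(mean, var) density at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def safe_log(p):
    """ln p, with ln 0 = -inf so that a zero-probability transition is never chosen."""
    return math.log(p) if p > 0 else -math.inf

def argmax(scores):
    """Index of the largest score (the first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)

def segment(x, init_means, max_iter=100):
    """Relaxation algorithm (Steps 0-4) for Gaussian classes with a common variance.
    Returns (labels, means, variance, transition matrix); labels are 0..k-1."""
    k, n = len(init_means), len(x)
    # Step 0: initial values (sample variance and uniform probabilities are assumptions).
    means = list(init_means)
    grand = sum(x) / n
    var = sum((v - grand) ** 2 for v in x) / n
    P = [[1.0 / k] * k for _ in range(k)]
    prior = [1.0 / k] * k
    labels = None
    for _ in range(max_iter):
        # Step 1: maximize f(gamma_1) f(x_1 | gamma_1).
        new = [argmax([safe_log(prior[c]) + log_normal(x[0], means[c], var)
                       for c in range(k)])]
        # Step 2: for t = 2..n maximize p_{gamma_{t-1} gamma_t} f(x_t | gamma_t).
        for t in range(1, n):
            prev = new[-1]
            new.append(argmax([safe_log(P[prev][d]) + log_normal(x[t], means[d], var)
                               for d in range(k)]))
        changed = new != labels
        labels = new
        # Step 3: re-estimate means, common variance (divisor n) and (3.2).
        for c in range(k):
            members = [v for v, g in zip(x, labels) if g == c]
            if members:                       # assumption: an empty class keeps its mean
                means[c] = sum(members) / len(members)
        # The floor (an assumption) keeps the density finite if every class is exact.
        var = max(sum((v - means[g]) ** 2 for v, g in zip(x, labels)) / n, 1e-12)
        counts = [[0] * k for _ in range(k)]
        for a, b in zip(labels, labels[1:]):
            counts[a][b] += 1
        for c in range(k):
            n_c = sum(counts[c])
            if n_c:                           # assumption: keep the old row when n_c = 0
                P[c] = [n_cd / n_c for n_cd in counts[c]]
        # Step 4: stop when no observation changed labels.
        if not changed:
            break
    return labels, means, var, P

labels, means, var, P = segment([0.0] * 5 + [10.0] * 5, [1.0, 9.0])
print(labels, means)   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] [0.0, 10.0]
```

Because a transition probability estimated as zero gives $\ln 0 = -\infty$ in Step 2, such a transition is never proposed again in this sketch, so its count and estimate stay at zero.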

Step 2 is Bayesian classification of $x_t$. Suppose the $(t-1)$-st observation had been tentatively classified into class $c$. Then the prior probability that the $t$-th observation belongs to class $d$ is $p_{cd}$, $d = 1, \ldots, k$. Hence all the techniques for classification in particular models are available (e.g., use of linear discriminant functions when the observations are multivariate normal with common covariance matrix).

Since the labels are treated as random and information equivalent to a prior distribution is put in, one might more properly term this a procedure of maximum a posteriori estimation, rather than maximum likelihood estimation.

Within each iteration Step 2 is the Viterbi algorithm (see [6]), a recursive optimal solution to the problem of estimating the state sequence of a discrete-time finite-state Markov process. In the present context it obtains the most probable sequence of labels, conditionally upon the results of Steps 0 and 1.
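
Step 2 as written chooses each $\gamma_t$ given only the label already estimated for $t-1$. The dynamic-programming recursion usually called the Viterbi algorithm instead carries, for every class, the best-scoring partial label path up to time $t$, and then backtracks, so that the whole sequence is jointly most probable under the current parameter values. A minimal Python sketch of that recursion for the conditionally independent case follows; the argument layout (log probabilities as nested lists, labels $0, \ldots, k-1$) and the example numbers are assumptions of the sketch.

```python
import math

def viterbi(log_init, log_P, log_dens):
    """Return the jointly most probable label path (labels 0..k-1).

    log_init[c]    = ln f(gamma_1 = c)
    log_P[c][d]    = ln p_cd (use -math.inf for forbidden transitions)
    log_dens[t][c] = ln f(x_{t+1} | gamma_{t+1} = c), conditionally independent case
    """
    k, n = len(log_init), len(log_dens)
    score = [log_init[c] + log_dens[0][c] for c in range(k)]
    back = []                                  # back[t-1][d]: best predecessor of d at t
    for t in range(1, n):
        ptr, new_score = [], []
        for d in range(k):
            best = max(range(k), key=lambda c: score[c] + log_P[c][d])
            ptr.append(best)
            new_score.append(score[best] + log_P[best][d] + log_dens[t][d])
        back.append(ptr)
        score = new_score
    path = [max(range(k), key=score.__getitem__)]
    for ptr in reversed(back):                 # backtrack from the last time point
        path.append(ptr[path[-1]])
    return path[::-1]

L = math.log
sticky = [[L(0.9), L(0.1)], [L(0.1), L(0.9)]]
# The third observation mildly favors class 1, but the persistent chain relabels it.
print(viterbi([L(0.5), L(0.5)], sticky, [[0, -5], [0, -5], [-1, 0], [0, -5], [0, -5]]))
# [0, 0, 0, 0, 0]
```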

3.2. The first iteration

When the $k$ class-conditional processes consist of independent, identically distributed normal random variables with common variance, one can start by choosing initial means and labeling the observations by a minimum-distance clustering procedure. (This is one iteration of ISODATA [2]; one could iterate further at this stage.) From this clustering, initial estimates of the transition probabilities and the variance are obtained. This starting procedure could also be used for fitting AR models by taking the initial values of the autoregression coefficients as zero.
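
A minimal Python sketch of this starting procedure for scalar observations, assuming distance is the absolute difference and that the variance pools squared deviations about the cluster means with divisor $n$. The function name, the handling of empty clusters and of classes never left, and the test values are hypothetical.

```python
def minimum_distance_start(x, init_means):
    """One minimum-distance pass: label each x_t by its nearest initial mean,
    then estimate the common variance and the transition probabilities."""
    k, n = len(init_means), len(x)
    labels = [min(range(k), key=lambda c: abs(v - init_means[c])) for v in x]
    means = []
    for c in range(k):
        members = [v for v, g in zip(x, labels) if g == c]
        # Assumption: an empty cluster keeps its initial mean.
        means.append(sum(members) / len(members) if members else init_means[c])
    var = sum((v - means[g]) ** 2 for v, g in zip(x, labels)) / n
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    # Rows with no outgoing transitions are set to 1/k (an assumption).
    P = [[cnt / sum(row) if sum(row) else 1.0 / k for cnt in row] for row in counts]
    return labels, var, P

labels, var, P = minimum_distance_start([0.1, -0.2, 5.1, 4.8, 0.0], [0.0, 5.0])
print(labels, P)   # [0, 0, 1, 1, 0] [[0.5, 0.5], [0.5, 0.5]]
```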


3.3. Estimation at the boundary

In Step 1 the label is estimated from $X_1$, without using even the neighboring $X_2$. Effects of possible error in estimating $\gamma_1$ will be mitigated as processing continues on toward $t = n$. In view of this, a way to mitigate these effects further is to "backcast," running every other iteration backwards. (This is possible since the time reversal of a Markov chain is again a Markov chain.) Another approach would be to run the algorithm $k$ times, once with each possible value of $\gamma_1$, and choose the best result. The results reported below, however, were obtained simply using Step 1 as is.


3.4. Restrictions on the transitions

As mentioned above, one might wish to place restrictions on the transitions, e.g., to allow transitions only to adjacent states. (E.g., "recovery" is adjacent to "recession," "expansion" is adjacent to "recovery," but "expansion" is not adjacent to "recession.") The model does permit restrictions on the transitions. The maximization is conducted subject to the condition that the corresponding transition probabilities are zero. This is easily implemented in the algorithm.

If initially one sets a given transition probability at zero, the algorithm will fit no such transitions, and consequently the corresponding transition probability will remain zero at every iteration.

4. An Example

Here, in the context of a specific numerical example, the problems of (1) fitting the model for a fixed $k$ and (2) choosing $k$ will be discussed.

The data. Quarterly gross national product (GNP) in current (i.e., non-constant) dollars for the years 1946 to 1982 was considered. The data are given in Table 1. They are quarterly, but scaled up to an annual basis. The notation 1946-1 denotes the first quarter of 1946; 1946-2, the second quarter of 1946; etc. The time series will be denoted by

$$y_t,\quad t = 1, 2, \ldots, 146.$$

Thus, $y_1$ is GNP for 1946-1, $y_2$ is GNP for 1946-2, etc. The datum for 1970-3 is 1003.6, or just over 1000. This means that in the third quarter of 1970 the economy was producing goods and services at a rate of just over one trillion (1000 billion) dollars per year. (Since the readership of this journal is international, it is worth mentioning that in this paper one million means a thousand thousands; a billion is a thousand millions; a trillion is a thousand billions.)

Choice of a transformation. Consider the linear statistical model

$$y_t = \alpha + \beta_1 x_{1t} + \cdots + \beta_p x_{pt} + u_t,$$

where $y$ is a dependent variable, the $x$'s are explanatory variables, and $u$ is noise. In this context Box and Cox [3] developed a method for choosing a transformation from among the power transformations

$$y^{(\lambda)} = \begin{cases} (y^{\lambda} - 1)/(\lambda\, y_{\mathrm{g.m.}}^{\lambda - 1}), & \lambda \neq 0, \\ y_{\mathrm{g.m.}} \ln(y), & \lambda = 0. \end{cases}$$

Here $y_{\mathrm{g.m.}}$ denotes the geometric mean of $y_t$, $t = 1, 2, \ldots, n$. The value $\lambda = 2$ corresponds to the square, 1 to no transformation, 0.5 to the square root, 0 to the log, and $-1$ to the reciprocal. One proceeds by fitting linear models

$$y_t^{(\lambda)} = \alpha^{(\lambda)} + \beta_1^{(\lambda)} x_{1t} + \cdots + \beta_p^{(\lambda)} x_{pt} + u_t^{(\lambda)}$$

for various values of $\lambda$, for example, $\lambda = -1$ to 2 in steps of 0.5. For any fixed value of $\lambda$, this is just an ordinary least squares analysis for the data $y_t^{(\lambda)}$, $t = 1, \ldots, n$. An assumption is that, for the true value of $\lambda$, the linear model holds with the $u_t^{(\lambda)}$ at least approximately normally distributed with constant standard deviation $\sigma_u^{(\lambda)}$. Maximum likelihood estimation of $\lambda$ reduces to comparison of the residual sums of squares $\mathrm{RSS}(\lambda)$ for various $\lambda$:

$$\mathrm{RSS}(\lambda) = \sum_{t=1}^{n} [y_t^{(\lambda)} - \text{pred. val. of } y_t^{(\lambda)}]^2,$$

where

$$\text{pred. val. of } y_t^{(\lambda)} = a^{(\lambda)} + b_1^{(\lambda)} x_{1t} + \cdots + b_p^{(\lambda)} x_{pt},$$

$a^{(\lambda)}$ and $b_j^{(\lambda)}$, $j = 1, \ldots, p$, being the maximum likelihood (least squares) estimates of $\alpha^{(\lambda)}$ and $\beta_j^{(\lambda)}$, $j = 1, \ldots, p$. A 95% confidence interval for $\lambda$ [4, pp. 239-240] is

$$\{\lambda : \mathrm{RSS}(\lambda) < \min_{\lambda} \mathrm{RSS}(\lambda)\, [1 + t^2(\nu; .025)/\nu]\},$$

where $t(\nu; .025)$ denotes the upper 2.5 percentage point (the 97.5th percentile) of Student's $t$-distribution with $\nu$ degrees of freedom, and $\nu = n - (p+1)$, the number of degrees of freedom for error. When $\nu$ is large, as is the case with applications to time series, $t(\nu; .025)$ is close to its asymptotic value of 1.96. The choice of 95% is conventional but somewhat arbitrary. For a 90% interval when $\nu$ is large one would use $t(\nu; .05)$, which for large $\nu$ is approximately 1.645.
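
A minimal Python sketch of the normalized power transformation and of the confidence-set threshold, assuming positive data. The function names are hypothetical, and the critical value $t(\nu; .025)$ is passed in rather than computed, so no statistics library is needed.

```python
import math

def box_cox(y, lam):
    """Normalized Box-Cox transform y^(lambda) of a list of positive values."""
    gm = math.exp(sum(math.log(v) for v in y) / len(y))   # geometric mean y_g.m.
    if lam == 0:
        return [gm * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * gm ** (lam - 1)) for v in y]

def rss_limit(min_rss, t_crit, nu):
    """Upper limit min RSS * (1 + t^2 / nu) defining the approximate confidence set."""
    return min_rss * (1 + t_crit ** 2 / nu)

print(box_cox([1.0, 4.0], 1), box_cox([1.0, 4.0], 0))
```

The scaling by $y_{\mathrm{g.m.}}^{\lambda-1}$ puts every transformed series on a comparable scale, which is what allows the residual sums of squares for different $\lambda$ to be compared directly; the $\lambda = 0$ case is the limit of the $\lambda \neq 0$ formula.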

This method was applied to time series by means of autoregression, taking the $x$'s in the linear model to be lagged versions of $y$. Eight lags of $y$ were used. The value 8 was chosen to incorporate the direct effects of lagged variables involved in anticipated regular or seasonal differencing of order one or two and regular or seasonal autoregression of order one or two. For example, a second-order autoregression of the first differences

$$z_t = y_t - y_{t-1}$$

takes the form

$$z_t = \alpha + \beta_1 z_{t-1} + \beta_2 z_{t-2} + u_t,$$

which is

$$y_t - y_{t-1} = \alpha + \beta_1 (y_{t-1} - y_{t-2}) + \beta_2 (y_{t-2} - y_{t-3}) + u_t,$$

or

$$y_t = \alpha + (\beta_1 + 1)\, y_{t-1} + (\beta_2 - \beta_1)\, y_{t-2} - \beta_2\, y_{t-3} + u_t,$$

which is a special case of

$$y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + u_t$$

with

$$\phi_1 = \beta_1 + 1, \quad \phi_2 = \beta_2 - \beta_1, \quad \text{and} \quad \phi_3 = -\beta_2.$$
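
These coefficient identities can be checked numerically: generating a series from the differenced AR(2) and from the level AR(3), with the same starting values and the same noise sequence, should give identical paths. A minimal Python sketch; the function names, coefficients, and noise values are arbitrary test inputs.

```python
def ar3_from_differenced_ar2(beta1, beta2):
    """(phi1, phi2, phi3) of the level model implied by an AR(2) on z_t = y_t - y_{t-1}."""
    return beta1 + 1, beta2 - beta1, -beta2

def simulate_differenced(y_start, alpha, beta1, beta2, noise):
    """Iterate y_t = y_{t-1} + alpha + beta1 z_{t-1} + beta2 z_{t-2} + u_t."""
    y = list(y_start)                     # three starting values y_{t-3}, y_{t-2}, y_{t-1}
    for u in noise:
        z1, z2 = y[-1] - y[-2], y[-2] - y[-3]
        y.append(y[-1] + alpha + beta1 * z1 + beta2 * z2 + u)
    return y

def simulate_level(y_start, alpha, phi, noise):
    """Iterate y_t = alpha + phi1 y_{t-1} + phi2 y_{t-2} + phi3 y_{t-3} + u_t."""
    y = list(y_start)
    for u in noise:
        y.append(alpha + phi[0] * y[-1] + phi[1] * y[-2] + phi[2] * y[-3] + u)
    return y

start, noise = [1.0, 1.2, 1.1], [0.05, -0.02, 0.01, 0.0, -0.03]
a = simulate_differenced(start, 0.01, 0.4, -0.2, noise)
b = simulate_level(start, 0.01, ar3_from_differenced_ar2(0.4, -0.2), noise)
print(max(abs(p - q) for p, q in zip(a, b)))   # zero up to rounding error
```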

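Due to the use of 8 lags, the value of $n$ for this regression analysis was $146 - 8 = 138$, and $\nu = 138 - (8+1) = 129$. The RSS for $y$ itself was 29,273. This is equal to RSS(1), since $y^{(1)} = y - 1$ differs from $y$ only by a constant, which is absorbed by the intercept. The RSS for $\log(y)$ was 0.00316. So, letting log denote common logs and ln denote natural logs, one has

$$\begin{aligned} \mathrm{RSS}(0) &= \text{RSS for } y_{\mathrm{g.m.}} \ln(y) \\ &= \text{RSS for } y_{\mathrm{g.m.}} \ln(10) \log(y) \\ &= (y_{\mathrm{g.m.}} \ln 10)^2 \times \text{RSS for } \log(y) \\ &= [(711.450)(2.3026)]^2 (0.00316) \\ &= 8{,}480. \end{aligned}$$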

A limit for the 95% confidence interval is given by

$$\min_{\lambda} \mathrm{RSS}(\lambda)\, [1 + 1.96^2/129] = 1.0298 \min_{\lambda} \mathrm{RSS}(\lambda).$$

Computations have been done only for $\lambda = 0$ and 1, so the $\lambda$ yielding $\min_\lambda \mathrm{RSS}(\lambda)$ has not been located. However, since the focus here is merely on choosing between $\lambda = 0$ and $\lambda = 1$, one can proceed as follows.

One notes that the confidence interval is given by a maximum acceptable value of $\mathrm{RSS}(\lambda)$. Since $\min_\lambda \mathrm{RSS}(\lambda) \le \mathrm{RSS}(0)$, this maximum acceptable value is at most $1.0298\, \mathrm{RSS}(0)$, which equals (1.0298)(8,480), or 8,732. The confidence interval

$$\{\lambda : \mathrm{RSS}(\lambda) < 8{,}732\}$$

based on this limit is conservative, in the sense that it includes more $\lambda$-values than may be necessary. Values that are excluded by this interval would also be excluded by the one based on $\min_\lambda \mathrm{RSS}(\lambda)$. Note in particular that $\lambda = 1$, corresponding to no transformation, is excluded, since RSS(1) = 29,273 exceeds 8,732. The log-transformed data will be used in what follows.
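
The arithmetic of the last two paragraphs can be reproduced in a few lines of Python. The inputs are the values reported above; the computed limit agrees with the text's 8,732 up to rounding.

```python
import math

gm = 711.450            # geometric mean of the GNP data
rss_log10 = 0.00316     # RSS of the 8-lag autoregression for log10(y)
rss1 = 29273.0          # RSS for y itself, i.e. RSS(1)
nu = 129                # degrees of freedom for error

rss0 = (gm * math.log(10)) ** 2 * rss_log10   # RSS(0), roughly 8,480
limit = rss0 * (1 + 1.96 ** 2 / nu)           # conservative limit, roughly 8,730
print(rss0, limit, rss1 < limit)              # lambda = 1 lies outside the interval
```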

Box-Jenkins analysis. The main focus of this paper is on the segmentation of the time series, but as a preliminary a Box-Jenkins analysis will be presented. Such an analysis aids with the choice of variable (difference, second difference, etc.) for segmentation.

"Box-Jenkins analysis" refers to the fitting of data with one or another model chosen from the Box-Jenkins ARIMA models. "ARIMA" means "autoregressive integrated moving average." A fuller notation is ARIMA(p,d,q), where p is the order of the autoregression, d is the order of differencing, and q is the order of the moving average part of the model. Systematic treatments of Box-Jenkins analysis include (in order of decreasing mathematical level) [5], [11], [10], and [8].

Nelson [11] analyzed quarterly GNP for the twenty years 1947 to 1966. He used an ARI(1,1) model, that is, he fit a first-order autoregression to the first differences. (The notation AR means "autoregressive." The notation ARI means "integrated autoregressive"; i.e., ARI(p,d) means that the d-th differences are AR(p).)

Here the mixed second differences of the logarithms were analyzed.

A plot of the first seasonal differences $y_t - y_{t-4}$, corresponding to the annual velocity of the economy, still seemed to trend upward. So did a plot of the regular differences $y_t - y_{t-1}$, which correspond to the quarterly velocity of the economy. Hence second differences were considered. The regular-seasonal mixed second differences

$$(y_t - y_{t-1}) - (y_{t-4} - y_{t-5})$$

appeared stationary. Second differences, corresponding to acceleration, provide a not unnatural way of looking at the data.

The Minitab computing system (see [13]) was used for the analysis. In fact, the transformation, differencing and plotting already referred to were done using Minitab. The arithmetic average of the common logarithms of the data is 2.8522, so the geometric mean of the data, $10^{2.8522}$ up to rounding, is 711.450. A model allowing for first-order regular autoregression and first-order seasonal autoregression was fit. In the Box-Jenkins notation for seasonal models,

$$\mathrm{ARIMA}(p,d,q)(P,D,Q)_S,$$

this is

$$\mathrm{ARIMA}(1,1,0)(1,1,0)_4.$$

In general, $S$ denotes the seasonality (the seasonal period, here 4 quarters), and $P$, $D$, and $Q$ the orders of seasonal autoregression, differencing, and moving average.

The value of the estimate of the regular autoregression coefficient was 0.4276; that of the seasonal autoregression coefficient was -0.5332. The value of the estimate of the constant in the model was -0.0001405. The residual sum of squares was 0.00602478.

Check on the constant term. The model was also fit without a constant term in the model. The value of the estimate of the regular autoregression coefficient was 0.4275; that of the seasonal autoregression coefficient was -0.5535. The residual sum of squares was 0.00602508.

Model selection criteria will be used in several ways in this paper. At this point their use will be illustrated with the decision of whether to retain the constant in the model. First some general remarks on model-selection criteria will be made.

Model selection criteria are figures of merit for alternative models. That model which optimizes the criterion is chosen. One such criterion is Akaike's information criterion (AIC). (See, e.g., [1].) Suppose there are $K$ alternative models $M_k$, $k = 1, \ldots, K$. The model chosen is the one which minimizes AIC($k$), where

$$\mathrm{AIC}(k) = -2 \ln[\max L(k)] + 2c(k).$$

Here $L(k)$ is the likelihood when $M_k$ is the model, max denotes its maximum over the parameters, and $c(k)$ is the number of independent parameters when $M_k$ is the model.

The statistic AIC($k$) is a natural estimate of the "cross-entropy" (see [12]) between $f$ and $g(k)$, where $f$ is the (unknown) true density and $g(k)$ is the density corresponding to the model $M_k$. According to AIC, inclusion of an additional parameter is appropriate if $\ln[\max L]$ increases by one unit or more, i.e., if $\max L$ increases by a factor of $e$ or more. Schwarz' model-selection criterion ([14], [7]),

$$-2 \ln[\max L(k)] + \ln(n)\, c(k),$$

being derived from a first-order approximation to the posterior probability of $M_k$, enjoys certain advantages. Note that both AIC and Schwarz' criterion are of the form

$$-2 \ln[\max L(k)] + a(n)\, c(k),$$

where $a(n) = \ln(n)$ for Schwarz' criterion and $a(n) = 2$ for AIC. According to Schwarz' criterion, an additional parameter will be included if it increases $\ln(\max L)$ by $\ln(n)/2$ or more, that is, if $\max L$ increases by a factor of $\sqrt{n}$ or more. In particular, for $n$ at least 8, Schwarz' criterion favors models with fewer parameters, relative to AIC.

Note that for Gaussian models

$$-2 \ln(\max L(k)) = n \ln(2\pi) + n \ln(v(k)) + n,$$

where $v(k)$ is the maximum likelihood estimate of the error variance in the model $M_k$: $v(k) = \mathrm{RSS}(k)/n$, where $\mathrm{RSS}(k)$ is the residual sum of squares in fitting the model $M_k$. In terms of $\mathrm{RSS}(k)$, this is

$$-2 \ln(\max L(k)) = n \ln(2\pi) + n \ln(\mathrm{RSS}(k)) - n \ln(n) + n.$$

This gives

$$-2 \ln(\max L(k)) + a(n)\, c(k) = n \ln(2\pi) + n \ln(\mathrm{RSS}(k)) - n \ln(n) + n + a(n)\, c(k).$$

To compare models, it suffices to compute only the portion depending upon $k$, namely, the statistic

$$n \ln(\mathrm{RSS}(k)) + a(n)\, c(k).$$
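
A quick numerical check of the Gaussian identity, using an arbitrary (hypothetical) residual vector, together with the threshold $n \ge 8$ at which Schwarz' penalty $\ln(n)$ first exceeds AIC's penalty of 2. The function names are illustrative.

```python
import math

def neg2_max_loglik(residuals):
    """-2 ln(max L) for i.i.d. N(0, v) errors, with v at its MLE RSS / n."""
    n = len(residuals)
    v = sum(e * e for e in residuals) / n
    loglik = sum(-0.5 * math.log(2 * math.pi * v) - e * e / (2 * v) for e in residuals)
    return -2 * loglik

def closed_form(rss, n):
    """n ln(2 pi) + n ln(RSS) - n ln(n) + n."""
    return n * math.log(2 * math.pi) + n * math.log(rss) - n * math.log(n) + n

residuals = [0.5, -1.0, 2.0, 0.3]
rss = sum(e * e for e in residuals)
print(abs(neg2_max_loglik(residuals) - closed_form(rss, len(residuals))) < 1e-9)  # True
# Smallest n for which ln(n) > 2, i.e. Schwarz' penalty exceeds AIC's:
print(next(n for n in range(1, 100) if math.log(n) > 2))  # 8
```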

To apply model-selection criteria to decide whether to include a constant term in the model, one takes $K = 2$, corresponding to two models, one with the constant (say $k = 1$) and the other without the constant ($k = 2$). One has $n = 146 - 5 = 141$, due to the regular and seasonal differencing. This gives

$$n \ln(\mathrm{RSS}(k)) + a(n)\, c(k) = 141 \ln(\mathrm{RSS}(k)) + a(141)\, c(k).$$

Here AIC will be used; it is favorable to inclusion of more parameters, so if AIC rejects the constant, then Schwarz' criterion would also. For AIC, $a(n) = 2$, so the statistic becomes $141 \ln(\mathrm{RSS}(k)) + 2c(k)$.

For $k = 1$ (model with constant term) this is $141 \ln(0.00602478) + 2(4)$, counting the number of parameters as four (regular and seasonal autoregression coefficients, constant, and error variance). This is equal to $141(-5.1118876) + 8 = -712.776$. For $k = 2$ (model without constant term) this is $141 \ln(0.00602508) + 2(3)$, the number of parameters being 3 instead of 4 due to the omission of the constant. This is $141(-5.1118378) + 6 = -714.769$, which is less than the value of $-712.776$ obtained for the model with the constant. (Note that the difference $714.769 - 712.776 = 1.993$ is essentially all due to the difference of 2 associated with the difference in number of parameters. The very slight improvement in residual sum of squares associated with the fitting of the additional parameter is more than offset by the use of an additional parameter.) Hence one concludes that the constant may be excluded. Note that this decision is made without any choice of an arbitrary level of significance. (Rational choice of level of significance involves simultaneous consideration of the power of the test, and power computations can be rather involved. In any case, most practitioners seem either unwilling or unable to do them.)
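
The comparison can be reproduced with the residual sums of squares and parameter counts reported above; this Python sketch also evaluates Schwarz' criterion, with $a(n) = \ln n$. Computed with double-precision logarithms, the AIC values come out within about 0.002 of those in the text, and both criteria favor the model without the constant.

```python
import math

n = 141
models = {"with constant": (0.00602478, 4), "without constant": (0.00602508, 3)}

def criterion(rss, c, a_n):
    """Model-dependent part n ln(RSS) + a(n) c of the selection criteria."""
    return n * math.log(rss) + a_n * c

aic = {name: criterion(rss, c, 2.0) for name, (rss, c) in models.items()}
schwarz = {name: criterion(rss, c, math.log(n)) for name, (rss, c) in models.items()}
for name in models:
    print(f"{name}: AIC {aic[name]:.3f}, Schwarz {schwarz[name]:.3f}")
```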

Segmentation analysis. The values of the differences and second differences for 1950 are strikingly higher than those for earlier and later years. On plots these observations appear to be "outliers." They locate very well the mobilization of the economy at the onset of the Korean conflict. The need for segmentation of the time series is apparent. The segmentation analysis will be performed on the mixed regular-seasonal second differences, as these appear to be stationary.

4.1. Fitting the model

In this section the fitting of a model with $k = 3$ classes is treated, discussion of the choice of $k$ being deferred to the next section. The three classes may be considered as corresponding to Recession, Recovery, and Expansion, although some may prefer to think of the segments labeled as Recovery as level periods corresponding to peaks and troughs. The approximate maximum likelihood solution found by the iterative procedure was (units are billions of current (non-constant) dollars) $-0.01125$, $0.00184$, and $0.01780$ for the means, $4.202 \times 10^{-3}$ for the standard deviation, and

$$\hat P = \begin{pmatrix} .4167 & .5556 & .0278 \\ .2151 & .7312 & .0538 \\ .0000 & .5455 & .4545 \end{pmatrix}$$

for the transition probability matrix.

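Remember that the input to the segmentation procedure was the
mixed regular-seasonal second difference of the common logs,

\[
x_t = (\log_{10} y_t - \log_{10} y_{t-1}) - (\log_{10} y_{t-4} - \log_{10} y_{t-5}).
\]

If the value of this variable at time t equals x, then

\[
y_t = 10^{x}\,\frac{y_{t-4}}{y_{t-5}}\,y_{t-1}.
\]

For example, if \(y_{t-4}/y_{t-5} = 1\), this gives \(y_t \approx 1.047\,y_{t-1}\) if \(x = 0.02\),
\(y_t \approx 1.0046\,y_{t-1}\) if \(x = 0.002\), and \(y_t \approx 0.977\,y_{t-1}\) if \(x = -0.01\).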
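The inversion can be checked numerically. The sketch below (the helper `next_value` is hypothetical) reproduces the three growth factors and recovers the 1950-1 value of Table 1 from its own transformed value.

```python
import math


def next_value(x, y_prev, y_lag4, y_lag5):
    """Invert the mixed regular-seasonal second difference of common logs.

    If x = (log10 y_t - log10 y_{t-1}) - (log10 y_{t-4} - log10 y_{t-5}),
    then y_t = 10**x * (y_{t-4} / y_{t-5}) * y_{t-1}.
    """
    return 10 ** x * (y_lag4 / y_lag5) * y_prev


# Growth factors y_t / y_{t-1} when y_{t-4} / y_{t-5} = 1.
for x in (0.02, 0.002, -0.01):
    print(x, round(next_value(x, 1.0, 1.0, 1.0), 4))

# Round trip on 1950-1 from Table 1: y_{t-1} = 1949-4, y_{t-4} = 1949-1, y_{t-5} = 1948-4.
y_t, y_prev, y_lag4, y_lag5 = 267.6, 256.8, 260.5, 265.9
x_t = (math.log10(y_t) - math.log10(y_prev)) - (math.log10(y_lag4) - math.log10(y_lag5))
print(round(next_value(x_t, y_prev, y_lag4, y_lag5), 1))
```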

The estimated labels are given in Table 2; labels (1, 2, 3, 4, 5)
resulting from fitting k = 5 classes (discussed below) are also given.
The process was in state 1 for 26% of the time, state 2 for 66% of the
time, and state 3 for 8% of the time.
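Given a labeling, transition probabilities can be estimated by relative transition frequencies, \(\hat p_{cd} = n_{cd}/n_c\), where \(n_{cd}\) counts one-step moves from class c to class d. Assuming this is how the matrix above relates to the final labels, the k = 3 labels of Table 2 should reproduce it, and the sketch below confirms that they do to four decimals. It also counts the quarters in each state: 36, 94 and 11 of 141, i.e. about 25.5%, 66.7% and 7.8%.

```python
from collections import Counter

# k = 3 labels from Table 2, 1947-2 through 1982-2 (141 quarters), in rows of 8.
LABELS_K3 = (
    "11222211" "11133332" "21111222" "21111233" "22212222" "22112233"
    "21121211" "22322212" "22222222" "22222221" "22222222" "12212223"
    "22222222" "22212212" "22232122" "22212222" "12121222" "22112"
)
labels = [int(ch) for ch in LABELS_K3]
k = 3

# Fraction of quarters spent in each class.
occupancy = Counter(labels)
shares = [occupancy[c] / len(labels) for c in range(1, k + 1)]

# Relative frequencies of one-step transitions c -> d.
pairs = Counter(zip(labels, labels[1:]))
P_hat = []
for c in range(1, k + 1):
    n_c = sum(pairs[(c, d)] for d in range(1, k + 1))
    P_hat.append([pairs[(c, d)] / n_c for d in range(1, k + 1)])

print([round(s, 3) for s in shares])
for row in P_hat:
    print([round(p, 4) for p in row])
```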

The conventional wisdom regarding recessions during the period of
time covered by these data is as follows (see, e.g., [9], pp.
209-211). In 1948-1949 there was a reduction of inventory investment.
In 1953-1954 there was a reduction in government expenditures when the
Korean conflict came to a close. From mid-1957 to late 1958 an ongoing
recession was aggravated by a drop in defense expenditures in late
1957. In 1960 monetary and fiscal authorities had put on the brakes;
interest rates had risen substantially during 1958 and 1959. Readers
can probably remember some more recent recessions.

An interesting feature of the model and the algorithm is that, as
the iterations proceed, some isolated labels change to conform to their
neighbors. This should be the case when \(p_{cc}\) is large relative to
\(p_{cd}\), \(d \neq c\), where \(p_{cd}\) is the probability of a transition
from class c to class d.
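To see why isolated labels are pulled toward their neighbors, consider a hypothetical relabeling rule that scores class c at time t by \(\log p_{c_{t-1}c} + \log f_c(x_t) + \log p_{c\,c_{t+1}}\), with \(f_c\) the Gaussian density having the fitted class mean and the common standard deviation of Section 4.1. This rule is an illustration only, not necessarily the update step of the paper's algorithm. For the hypothetical value x = 0.011 the likelihood alone favors class 3, but with both neighbors in class 2 the small probability \(p_{23} = 0.0538\) tips the label to class 2.

```python
import math

# Fitted k = 3 model of Section 4.1: class means, common standard deviation,
# and transition matrix (row c gives the probabilities of moving from class c).
MEANS = [-0.01125, 0.00184, 0.01780]
SD = 4.202e-3
P = [[0.4167, 0.5556, 0.0278],
     [0.2151, 0.7312, 0.0538],
     [0.0000, 0.5455, 0.4545]]


def log_normal_density(x, mean, sd):
    """Log of the Gaussian density N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def best_label_alone(x):
    """1-based class with the highest Gaussian likelihood, ignoring neighbors."""
    return 1 + max(range(len(MEANS)),
                   key=lambda c: log_normal_density(x, MEANS[c], SD))


def best_label_with_neighbors(x, prev_label, next_label):
    """Hypothetical rule: maximize p(prev -> c) * f_c(x) * p(c -> next)."""
    def score(c):
        p_in = P[prev_label - 1][c]
        p_out = P[c][next_label - 1]
        if p_in == 0.0 or p_out == 0.0:
            return -math.inf  # transition judged impossible by the fitted chain
        return math.log(p_in) + log_normal_density(x, MEANS[c], SD) + math.log(p_out)
    return 1 + max(range(len(MEANS)), key=score)


x = 0.011  # hypothetical observation lying between the class 2 and class 3 means
print(best_label_alone(x))                 # likelihood alone
print(best_label_with_neighbors(x, 2, 2))  # both neighbors labeled 2
```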

4.2. Choice of number of classes

Various values of k were tried, the results being scored by
means of Akaike's and Schwarz's model-selection criteria.

The results are given in Table 3. The best segmentation model, as
indicated by the minimum value of Schwarz's criterion, is that with five
classes. (It may be possible to associate these in some way with
Recession, Trough, Recovery, Expansion, and Peak.) AIC would choose 7
classes.
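Both criteria are \(-2\log L\) plus a penalty, so Schwarz's column can be recovered from Akaike's. Assume n = 141 observations (the quarters labeled in Table 2) and \(m = k + 1 + k(k-1) = k^2 + 1\) free parameters (k means, one common variance, k(k - 1) transition probabilities); this count is an inference, not one stated in the text. Under that assumption the sketch below reproduces every Schwarz value in Table 3 to within about 0.1 and confirms the two minima.

```python
import math

# Table 3: Akaike's criterion for k = 2, ..., 9 classes.
AIC = {2: 912.3, 3: 825.0, 4: 749.8, 5: 715.8, 6: 696.4, 7: 664.8, 8: 670.9, 9: 671.0}
N_OBS = 141  # assumed: quarters 1947-2 through 1982-2 left after differencing


def n_params(k):
    """Assumed count: k means, one common variance, k(k-1) free transition probabilities."""
    return k + 1 + k * (k - 1)


def schwarz_from_akaike(aic, k, n=N_OBS):
    """SIC = AIC - 2m + m ln n, since both equal -2 log L plus a penalty."""
    m = n_params(k)
    return aic - 2 * m + m * math.log(n)


SIC = {k: schwarz_from_akaike(a, k) for k, a in AIC.items()}
for k in AIC:
    print(k, AIC[k], round(SIC[k], 1))
print("AIC picks k =", min(AIC, key=AIC.get), "; SIC picks k =", min(SIC, key=SIC.get))
```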

5. Extensions

The segmentation procedure has been illustrated here for the
univariate case, with an assumption of common variance.

Class-specific variances can be allowed. One can use model-selection
criteria to decide whether or not to use separate class variances.

Multiple time series can be treated. Again, one can use model-selection
criteria to decide whether or not to use separate class
covariance matrices. Computer programs to perform these analyses have
already been written by the author.

Here we fit only the independent, identically distributed model
within segments. An extension will be the fitting of Box-Jenkins
models within segments.

Though the segmentation method presented is general, the focus
here has been on Gaussian data. There are other important special
cases. In epidemiology, one might wish to segment series for
which the observed variable X is a discrete count. In sampling by
attribute in industrial quality control, X is binary. One might
wish to segment the output stream according to the classes "in
control," "close to control," and "out of control," and estimate the
proportion of defectives in these classes.
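In these cases only the class-conditional density changes. Below is a minimal sketch, with hypothetical class parameters, of the Poisson and Bernoulli log-likelihoods that would take the place of the Gaussian one. In a full segmentation these terms would be combined with the transition probabilities rather than used alone as they are here.

```python
import math


def log_poisson(x, rate):
    """log P(X = x) for a Poisson count with mean `rate`."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def log_bernoulli(x, p):
    """log P(X = x) for a binary outcome (1 = defective) with P(X = 1) = p."""
    return math.log(p) if x == 1 else math.log(1 - p)


def most_likely_class(x, params, log_density):
    """1-based class whose parameter gives x the highest likelihood."""
    return 1 + max(range(len(params)), key=lambda c: log_density(x, params[c]))


# Hypothetical class parameters, e.g. "in control", "close to control", "out of control".
RATES = [2.0, 5.0, 12.0]
DEFECT_PROBS = [0.01, 0.05, 0.20]

print(most_likely_class(11, RATES, log_poisson))
print(most_likely_class(1, DEFECT_PROBS, log_bernoulli))
print(most_likely_class(0, DEFECT_PROBS, log_bernoulli))
```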
TABLE 1. Quarterly GNP, 1946-1 through 1982-2.

Units: billions of current (non-constant) dollars
(Time Series #534, National Bureau of Economic Research, from
BCD: Business Cycle Developments, U.S. Department of Commerce)

Year       Q1       Q2       Q3       Q4
1946    197.7    205.3    215.6    220.7
1947    225.1    229.3    233.6    244.0
1948    250.0    257.5    264.5    265.9
1949    260.5    257.0    258.9    256.8
1950    267.6    277.1    294.8    306.3
1951    320.4    328.3    335.0    339.2
1952    341.9    342.1    347.8    360.0
1953    366.1    369.4    368.4    363.1
1954    362.5    362.3    366.7    375.6
1955    388.2    396.2    404.8    411.0
1956    412.8    418.4    423.5    432.1
1957    440.2    442.3    449.4    444.0
1958    436.8    440.7    453.9    467.0
1959    477.0    490.6    489.0    495.0
1960    506.9    506.3    508.0    504.8
1961    503.2    519.2    528.2    542.6
1962    554.2    562.7    568.9    574.3
1963    582.0    590.7    601.8    612.4
1964    625.3    634.0    642.8    648.8
1965    668.8    681.7    696.4    717.2
1966    738.5    750.0    760.6    774.9
1967    780.7    788.6    805.7    823.3
1968    841.2    867.2    884.9    900.3
1969    921.2    937.4    955.3    962.0
1970    972.0    986.3   1003.6   1009.0
1971   1049.3   1068.9   1086.6   1105.8
1972   1142.4   1171.7   1196.1   1233.5
1973   1283.5   1307.6   1337.7   1376.7
1974   1387.7   1423.8   1451.6   1473.8
1975   1479.8   1516.7   1578.5   1621.8
1976   1672.0   1698.6   1729.0   1772.5
1977   1834.8   1895.1   1954.4   1988.9
1978   2031.7   2139.5   2202.5   2281.6
1979   2335.5   2337.9   2454.8   2502.9
1980   2575.9   2573.4   2643.7   2739.4
1981   2864.9   2901.8   2980.9   3003.2
1982   2995.5   3041.2
TABLE 2. Estimated labels

(47-2 denotes the second quarter of 1947; 47-3, the third quarter of 1947; etc.)

Quarter:  47-2 47-3 47-4 48-1 48-2 48-3 48-4 49-1
k = 3:       1    1    2    2    2    2    1    1
k = 5:       2    1    4    3    3    3    1    1

Quarter:  49-2 49-3 49-4 50-1 50-2 50-3 50-4 51-1
k = 3:       1    1    1    3    3    3    3    2
k = 5:       1    2    2    5    5    5    5    3

Quarter:  51-2 51-3 51-4 52-1 52-2 52-3 52-4 53-1
k = 3:       2    1    1    1    1    2    2    2
k = 5:       2    1    1    1    1    2    4    3

Quarter:  53-2 53-3 53-4 54-1 54-2 54-3 54-4 55-1
k = 3:       2    1    1    1    1    2    3    3
k = 5:       3    2    1    2    2    3    5    5

Quarter:  55-2 55-3 55-4 56-1 56-2 56-3 56-4 57-1
k = 3:       2    2    2    1    2    2    2    2
k = 5:       4    3    2    1    2    2    3    3

Quarter:  57-2 57-3 57-4 58-1 58-2 58-3 58-4 59-1
k = 3:       2    2    1    1    2    2    3    3
k = 5:       2    3    1    1    3    3    5    5

Quarter:  59-2 59-3 59-4 60-1 60-2 60-3 60-4 61-1
k = 3:       2    1    1    2    1    2    1    1
k = 5:       4    1    2    3    1    3    2    2

Quarter:  61-2 61-3 61-4 62-1 62-2 62-3 62-4 63-1
k = 3:       2    2    3    2    2    2    1    2
k = 5:       4    3    4    3    2    2    2    2

Quarter:  63-2 63-3 63-4 64-1 64-2 64-3 64-4 65-1
k = 3:       2    2    2    2    2    2    2    2
k = 5:       3    3    3    3    3    2    2    3

Quarter:  65-2 65-3 65-4 66-1 66-2 66-3 66-4 67-1
k = 3:       2    2    2    2    2    2    2    1
k = 5:       3    3    4    3    2    2    2    2

Quarter:  67-2 67-3 67-4 68-1 68-2 68-3 68-4 69-1
k = 3:       2    2    2    2    2    2    2    2
k = 5:       2    3    3    3    4    3    2    3

Quarter:  69-2 69-3 69-4 70-1 70-2 70-3 70-4 71-1
k = 3:       1    2    2    1    2    2    2    3
k = 5:       2    3    2    2    2    3    3    4

Quarter:  71-2 71-3 71-4 72-1 72-2 72-3 72-4 73-1
k = 3:       2    2    2    2    2    2    2    2
k = 5:       3    3    3    2    3    3    3    3

Quarter:  73-2 73-3 73-4 74-1 74-2 74-3 74-4 75-1
k = 3:       2    2    2    1    2    2    1    2
k = 5:       2    3    3    1    3    2    2    2

Quarter:  75-2 75-3 75-4 76-1 76-2 76-3 76-4 77-1
k = 3:       2    2    2    3    2    1    2    2
k = 5:       3    4    3    4    2    2    3    3

Quarter:  77-2 77-3 77-4 78-1 78-2 78-3 78-4 79-1
k = 3:       2    2    2    1    2    2    2    2
k = 5:       4    3    2    2    4    3    4    3

Quarter:  79-2 79-3 79-4 80-1 80-2 80-3 80-4 81-1
k = 3:       1    2    1    2    1    2    2    2
k = 5:       1    3    2    3    2    2    4    3

Quarter:  81-2 81-3 81-4 82-1 82-2
k = 3:       2    2    1    1    2
k = 5:       3    3    1    1    3
TABLE 3. Fitting models

Number of classes, k    Akaike's criterion    Schwarz's criterion
         2                    912.3                 927.1
         3                    825.0                 854.5
         4                    749.8                 800.0
         5                    715.8                 792.5*
         6                    696.4                 805.5
         7                    664.8*                812.3
         8                    670.9                 862.5
         9                    671.0                 912.8

* denotes minimum